Clustering for high availability and disaster recovery

ABSTRACT

Embodiments are directed towards managing data storage within a cluster environment having a plurality of indexers using redundancy, the data being managed using a generation identifier, such that a primary indexer is designated for a given generation of data. When a master device for the cluster fails, data may continue to be stored using redundancy, and data searches may still be performed.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation patent application of patent application Ser. No. 13/648,116 filed on Oct. 9, 2012, which claims priority under 35 U.S.C. §119(e) and 37 C.F.R. §1.78 of U.S. Provisional Patent Application Ser. No. 61/647,245, entitled “Clustering for High Availability and Disaster Recovery,” filed on May 15, 2012, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The disclosed embodiments relate generally to managing data storage and recovery and, more particularly, but not exclusively, to managing within a cluster environment data storage using data replication and generation identifiers for data storage and recovery.

BACKGROUND

Today's Internet has evolved into a ubiquitous network that has compelled many businesses to rely upon it as a major resource for doing business. For example, many businesses may employ services on the Internet to perform backup of various aspects of their local computing resources, including providing for high availability of backed up data.

In response to the need to provide for a networking infrastructure with both high availability of data and recovery from disasters, cluster architectures were developed. Briefly, a cluster architecture can be defined as multiple loosely coupled network devices that cooperate to provide client devices access to one or more services over the network.

However, some cluster architectures that are employed for data backup may spread different portions of data across a large number of members within the cluster to minimize the likelihood of loss of large amounts of data should one of the members fail. When loss of even a portion of the data may be significant to the clients depending upon the cluster, this may not be a tolerable result. Therefore, it is with respect to these considerations and others that the present invention has been made.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting and non-exhaustive embodiments are described with reference to the following drawings. In the drawings, like reference numerals refer to like parts throughout the various figures unless otherwise specified.

For a better understanding of the present embodiments, reference will be made to the following Detailed Description, which is to be read in association with the accompanying drawings, in which:

FIG. 1 illustrates a system diagram of one embodiment of an environment in which the embodiments may be practiced;

FIG. 2 illustrates one possible embodiment of a client device usable within the environment of FIG. 1;

FIG. 3 illustrates one possible embodiment of a network device usable by a content provider within the environment of FIG. 1;

FIG. 4 illustrates one non-limiting, non-exhaustive example of managing redundant data backup and recovery across a plurality of member indexers within a cluster;

FIG. 5 illustrates one embodiment of a signal flow usable to manage redundancy (replication) of data across multiple member indexers within a cluster;

FIG. 6 illustrates a flow chart of one embodiment of a process usable to manage redundancy (replication) of data across multiple member indexers within a cluster; and

FIG. 7 illustrates non-limiting, non-exhaustive examples of managing a request for data during a member indexer failure within a cluster.

DETAILED DESCRIPTION

The present embodiments now will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific aspects in which the embodiments may be practiced. These embodiments may, however, take many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope to those skilled in the art. Among other things, the present embodiments may include methods or devices. Accordingly, the various embodiments may take the form of entirely hardware or a combination of software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.

Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrase “in one embodiment” as used herein does not necessarily refer to the same embodiment, though it may. Furthermore, the phrase “in another embodiment” as used herein does not necessarily refer to a different embodiment, although it may. Thus, as described below, various embodiments of the invention may be readily combined, without departing from the scope or spirit of the invention.

In addition, as used herein, the term “or” is an inclusive “or” operator, and is equivalent to the term “and/or,” unless the context clearly dictates otherwise. The term “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. In addition, throughout the specification, the meaning of “a,” “an,” and “the” include plural references. The meaning of “in” includes “in” and “on.”

The following briefly provides a simplified summary of the subject innovations in order to provide a basic understanding of some aspects. This brief description is not intended as an extensive overview. It is not intended to identify key or critical elements, or to delineate or otherwise narrow the scope. Its purpose is merely to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.

Briefly stated, subject innovations are directed towards managing data backup and access within a cluster environment having a plurality of member indexers using storage redundancy (or replication). Although described in more detail below, an indexer is any computing device arranged to store and retrieve data. A member indexer may be selected as a primary source for data storage by a forwarder device that is external to the cluster, where briefly, a forwarder device is any computing device configured to produce, discover, or collate data and then send the data to an indexer.

In one embodiment, the data may be stored in what is herein termed “buckets.” In any event, the forwarder device may, in one embodiment, operate within a client device. In one embodiment, the forwarder device may employ a load balancing algorithm to select the member indexer to which to send data. In one embodiment, each forwarder device within a plurality of forwarder devices may select an indexer independent of another forwarder device. The forwarder device may, in one embodiment, specify a number of copies or a number of times the data is to be replicated for data recovery, herein termed a “replication factor.” However, in other embodiments, the replication factor may be configured on an indexer based on a variety of factors, including a type of data to be replicated. In any event, the selected indexer may generate a journal about the received data, including metadata, usable to rebuild the data. The journal and data may then be sent to one or more other member indexers for replication. In one embodiment, the selected indexer may be designated as a primary member indexer for the received data. In one embodiment, a cluster master may identify a number of other member indexers that may be secondary indexers for the data based, in part, on availability to receive and to save replicates of the data. The number of secondary indexers selected corresponds to the number of desired replications indicated by the forwarder device. In one embodiment, the secondary indexers may also be called “shadow” indexers. Moreover, in at least one embodiment, the data being saved may be termed a “slice of data.” Further, a member indexer may be designated primary (or secondary) for one bucket of data, and secondary (or primary) for a different bucket of different data. Thus, in one embodiment, a bucket on an indexer may be considered primary for some requests, such as, for example, searches originating from the east coast, while being designated as secondary for other requests, such as searches originating from the west coast, or the like.

The primary indexer provides an Acknowledgement (ACK) message to the forwarder device to indicate that the received data has been received and saved. ACK messages may be received by the primary indexer from the secondary indexers to indicate that they respectively received and saved the data for replication. Failure to return an ACK to the forwarder when initially sending data may result in the forwarder device selecting a different indexer. Failure to return an ACK by a secondary indexer to the primary may result in either resending of the data to the secondary, or a selection of another secondary, as appropriate.
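The acknowledgement handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `Indexer`, `replicate`, and `forward` are hypothetical, an ACK is modeled as a boolean return value, and the replication factor is taken to be the number of secondary copies.

```python
from dataclasses import dataclass, field


@dataclass
class Indexer:
    """Hypothetical in-memory stand-in for a member indexer."""
    name: str
    up: bool = True
    stored: list = field(default_factory=list)

    def save(self, data):
        """Save data and return True (an ACK), or False when no ACK is sent."""
        if not self.up:
            return False
        self.stored.append(data)
        return True


def replicate(data, candidates, replication_factor):
    """Primary-side step: send data to secondaries and collect ACKs.

    A secondary that does not ACK is skipped and another one is selected.
    """
    acked = []
    for secondary in candidates:
        if len(acked) == replication_factor:
            break
        if secondary.save(data):
            acked.append(secondary)
    return acked


def forward(data, indexers, replication_factor):
    """Forwarder-side step: try indexers until one ACKs as primary."""
    for primary in indexers:
        if primary.save(data):
            others = [i for i in indexers if i is not primary]
            return primary, replicate(data, others, replication_factor)
    raise RuntimeError("no indexer acknowledged the data")
```

In this sketch a missing ACK from the first selected indexer simply causes the forwarder to move on to the next one; resending to the same secondary, which the text also allows, is omitted for brevity.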

Transitions of a primary indexer for a given bucket, whether due to a planned or an unplanned event, are managed using a generation identifier (GEN_ID) that indicates a particular time-based set of the data. Thus, an indexer may be designated as primary for a given bucket for a GEN_ID of, say, zero, but a different indexer may be primary for the bucket at a different GEN_ID of, say, one. Further, a bucket on an indexer may be primary for different types of searches at different GEN_IDs. The GEN_ID then may be a point at which these “primacy” decisions can be changed. Thus, as an aside, for example, at GEN_ID equal to, say, 100, bucket A on indexer 1 may be primary for everything, and bucket A on indexer 2 is secondary. Then, at GEN_ID equal to 150, these roles may change, where bucket A on indexer 1 is primary for east coast searches, and secondary for west coast searches, while indexer 2 is then the opposite: primary for west coast searches and secondary for east coast searches. Further, at GEN_ID of 200, this may change again, where bucket A on indexer 1 is secondary for all searches, and indexer 2's bucket A is primary for all searches. Thus, at every generation, every bucket's primacy settings may be set so every search will visit each bucket once. For example, this invariant might be violated if bucket A were simultaneously primary for all searches on indexer 1 and primary for east coast searches on indexer 2. Then a search coming from the east coast would get data from bucket A twice. This is prevented, however, by monitoring the GEN_ID to determine when a given GEN_ID's new primacy rules can be in effect.

In any event, in one embodiment, GEN_ID may be a monotonically increasing counter. Moreover, in one embodiment, GEN_ID may be global across all buckets of stored data by the cluster.
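The invariant that every search visits each bucket exactly once per generation can be checked mechanically. The sketch below is illustrative only; the table layout, the two search classes, and the function name `violations` are assumptions made for the example, populated with the GEN_ID 100, 150, and 200 settings from the text.

```python
from collections import Counter

# Assumed search classes, taken from the east coast / west coast example.
SEARCH_CLASSES = {"east", "west"}


def violations(primacy_rules):
    """Return (bucket, search_class) pairs not served by exactly one primary.

    `primacy_rules` is a list of (bucket, indexer, classes) tuples, where
    `classes` is the set of search classes that copy is primary for.
    """
    counts = Counter()
    buckets = set()
    for bucket, _indexer, classes in primacy_rules:
        buckets.add(bucket)
        for search_class in classes:
            counts[(bucket, search_class)] += 1
    return sorted(
        (bucket, search_class)
        for bucket in buckets
        for search_class in SEARCH_CLASSES
        if counts[(bucket, search_class)] != 1
    )


# Primacy settings per GEN_ID, mirroring the example in the text.
GENERATIONS = {
    100: [("A", "indexer1", {"east", "west"}), ("A", "indexer2", set())],
    150: [("A", "indexer1", {"east"}), ("A", "indexer2", {"west"})],
    200: [("A", "indexer1", set()), ("A", "indexer2", {"east", "west"})],
}

# The violating case: bucket A is primary for east coast searches twice.
BAD = [("A", "indexer1", {"east", "west"}), ("A", "indexer2", {"east"})]
```

Each entry in `GENERATIONS` passes the check, while `BAD` reports the pair `("A", "east")`, the double visit the text warns about.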

In one embodiment, search requests, retrieval of stored data, or any of a variety of data recovery requests may be initiated by a searcher device, sometimes referred to as a “search head.” In one embodiment, the searcher device may reside within a client device, or be another device accessed by the client device. In one embodiment, the searcher device may be external from the cluster. In any event, a request may include information, including a GEN_ID, indicating which generation of data is being requested. In one embodiment, the GEN_ID may be obtained by the searcher device from the master device.

In one embodiment, the request may be broadcast to all of the member indexers. However, the request may be ignored by secondary indexers for that GEN_ID. The primary indexer for that GEN_ID responds to the request for the data. Moreover, because secondary indexers do not respond to the request, a search request does not receive multiple responses of the same data. Should the primary indexer fail, or otherwise fail to respond to the request, in one embodiment, a master device may designate one of the secondary indexers as a primary that may then respond to the request. In one embodiment, the new primary indexer may also be designated as the primary indexer for storing of data associated with an incremented GEN_ID. Subsequent requests for data may then employ the new GEN_ID.
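A small sketch of the broadcast-and-failover behavior just described follows. It is a simplification under stated assumptions: the `Master` class, its `promote` method, and the `broadcast` function are hypothetical names, indexers are modeled as a dictionary of `(up, data)` pairs, and primacy is looked up in the master's table rather than held by each indexer.

```python
class Master:
    """Hypothetical cluster master tracking the primary for each GEN_ID."""

    def __init__(self, primary, secondaries):
        self.gen_id = 0
        self.secondaries = list(secondaries)
        self.primary_by_gen = {0: primary}

    def promote(self):
        """Designate a secondary as primary under an incremented GEN_ID."""
        new_primary = self.secondaries.pop(0)
        self.gen_id += 1
        self.primary_by_gen[self.gen_id] = new_primary
        return self.gen_id


def broadcast(master, indexers, gen_id, query):
    """Send the request to every member; only the primary for gen_id answers."""
    responses = []
    for name, (up, data) in indexers.items():
        if up and master.primary_by_gen.get(gen_id) == name:
            responses.append([row for row in data if query in row])
    return responses
```

Because secondaries stay silent, a healthy cluster yields exactly one response per request; after the primary fails, the request goes unanswered until a secondary is promoted and the searcher uses the new GEN_ID.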

In one embodiment, member indexers may store different ‘slices’ of data having different GEN_IDs. Moreover, as disclosed further below, the cluster environment with member indexers may be minimally impacted by a master failure. That is, because the forwarder device selects the member indexer to receive data for storage and replication distribution, the forwarder device need not interact with the master device. Indexers may operate to receive and/or distribute for replication and storage, data, journals, or the like, independent of the master device. Further, although the master device manages GEN_IDs for data, requests for data may employ earlier GEN_ID numbers and therefore are also minimally impacted by a master device failure. Although a master device may be aware of which indexer is primary or secondary for each bucket of data being managed in the system, the master device does not manage the data or the state of the data per se, as the state of the data is also known by each of the member indexers in the aggregate. Therefore, if the master device is unavailable for some amount of time, availability of data reception, indexing, and searching are not immediately impacted. However, in one embodiment, the master device may manage not only the GEN_ID transition updates, but may also re-assign, as appropriate, replication indexers when a member indexer fails.

Illustrative Operating Environment

FIG. 1 shows components of one embodiment of an environment in which the invention may be practiced. Not all the components may be required to practice the invention, and variations in the arrangement and type of the components may be made without departing from the spirit or scope of the invention. As shown, system 100 of FIG. 1 includes local area networks (“LANs”)/wide area networks (“WANs”) (network) 108, wireless network 107, client devices 101-106, searcher device 110, and cluster 120. Cluster 120 includes a plurality of cluster member indexers 121-123, and master device 126.

One embodiment of client devices 101-106 is described in more detail below in conjunction with FIG. 2. In one embodiment, at least some of client devices 101-106 may operate over a wired and/or a wireless network such as networks 107 and 108. As shown, client device 101 may include virtually any computing device capable of communicating over a network to send and receive information, including instant messages, performing various online activities, or the like. The set of such devices may include devices that typically connect using a wired or wireless communications medium such as personal computers, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, or the like. Also, client device 102 may include virtually any device usable as a video display device, such as a television, display monitor, display screen, projected screen, and the like. Additionally, client device 106 may include any kind of Consumer Electronic device, e.g., a Blu-ray player, DVD player, CD player, portable music playing device, portable display projector, and the like. Moreover, client devices 101-106 may provide access to various computing applications, including a browser, or other web-based application.

Generally, however, client devices 101-106 may include virtually any portable computing device capable of receiving and sending messages over a network, such as network 108, wireless network 107, or the like, and of accessing and/or playing content. Further, client devices 103-105 may include virtually any portable computing device capable of connecting to another computing device and receiving information, such as laptop computer 103, smart phone 104, tablet computers 105, and the like. However, portable computer devices are not so limited and may also include other portable devices such as cellular telephones, display pagers, radio frequency (“RF”) devices, infrared (“IR”) devices, Personal Digital Assistants (“PDAs”), handheld computers, wearable computers, integrated devices combining one or more of the preceding devices, and the like. As such, client devices 101-106 typically range widely in terms of capabilities and features.

A web-enabled client device may include a browser application that is configured to receive and to send web pages, web-based messages, and the like. The browser application may be configured to receive and display graphics, text, multimedia, media content, and the like, employing virtually any internet based and/or network-based protocol, including but not limited to wireless application protocol (“WAP”) messages, Hypertext Transfer Protocol (“HTTP”), or the like. In one embodiment, the browser application is enabled to employ Handheld Device Markup Language (“HDML”), Wireless Markup Language (“WML”), WMLScript, JavaScript, Standard Generalized Markup Language (“SGML”), HyperText Markup Language (“HTML”), eXtensible Markup Language (“XML”), and the like, to display and send a message. In one embodiment, a user of a client device may employ the browser application to perform various activities over a network (online). However, another application may also be used to perform various online activities.

Client devices 101-106 also may include at least one other client application that is configured to receive content from and/or send content to another computing device. The client application may include a capability to send and/or receive content, or the like. The client application may further provide information that identifies itself, including a type, capability, name, and the like. In one embodiment, client devices 101-106 may identify themselves as part of a class of devices. In another embodiment, client devices 101-106 may uniquely identify themselves through any of a variety of mechanisms, including a phone number, Mobile Identification Number (“MIN”), an electronic serial number (“ESN”), Internet Protocol (IP) Address, network address, or other mobile device identifier. The information may also indicate a content format that the mobile device is enabled to employ. Such information may be provided in a network packet, or the like, sent between other client devices, searcher device 110, and/or any one or more of member indexers 121-123, master device 126, or other computing devices. Moreover, it should be readily understood that devices and/or components within a device that is communicating with a client device may also identify themselves using any of a variety of mechanisms, including those used by the client device.

Client devices 101-106 may further be configured to include a client application that enables an end-user to log into an end-user account that may be managed by another computing device. Such end-user account, in one non-limiting example, may be configured to enable the end-user to manage one or more online activities, including, in one non-limiting example, search activities, social networking activities, browsing various websites, communicating with other users, or the like.

Moreover, in one embodiment, client devices 101-106 may include an application described further below as a forwarder. In one embodiment, the forwarder application may enable a client device to operate as a forwarder device to provide data to one or more members of cluster 120 for backup. As described below, the forwarder device may select an indexer within cluster 120 based on any of a variety of mechanisms, to receive the data for backup and to manage replication of the data based on a provided replication factor. Forwarder devices may make the selection based on a load-balancing algorithm, including a least loaded algorithm, a fastest response algorithm, a round-robin, random selection, or any of a variety of other mechanisms. Should the selected member indexer fail to provide an acknowledgement in response to receiving data from the forwarder device for storage, the forwarder device may elect to select a different indexer to manage the data storage and replication. In one embodiment, the forwarder device may receive information about the available member indexers 121-123 from master device 126.
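Three of the selection mechanisms named above can be sketched in a few lines each. The function names and the shape of the load table are assumptions made for illustration; the disclosure does not specify how load is measured or reported to a forwarder.

```python
import itertools
import random


def round_robin(indexers):
    """Return an iterator that cycles through the indexers in order."""
    return itertools.cycle(indexers)


def least_loaded(load_by_indexer):
    """Pick the indexer reporting the smallest load value."""
    return min(load_by_indexer, key=load_by_indexer.get)


def random_choice(indexers, rng=random):
    """Pick an indexer uniformly at random; `rng` may be a seeded Random."""
    return rng.choice(indexers)
```

Since each forwarder may select independently of the others, none of these functions needs shared state across forwarders; a forwarder would move to its next candidate when the selected indexer does not acknowledge.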

Client devices 101-106 may also interact with searcher device 110 for recovery of data, to perform search queries on stored data, or any of a variety of other queries of data stored and replicated by cluster 120.

Thus, searcher device 110 may be virtually any network device that is configured to perform search, recovery, or other operations upon the data managed by cluster 120. In one embodiment, searcher device 110 may obtain information from master device 126 indicating a current GEN_ID for data that searcher device 110 is to request. In some embodiments, the searcher device 110 may also receive a list of indexers to contact. Searcher device 110 may then send a request for such data to cluster 120 that may include requests for multiple sets of data and the GEN_ID from which to obtain the data. Searcher device 110 receives a response from a designated primary member indexer within cluster 120 for the requested data and GEN_ID. In one embodiment, searcher device 110 may perform various actions on the received data, and/or provide the data to one or more client devices.

Wireless network 107 is configured to couple client devices 103-105 and their components with network 108. Wireless network 107 may include any of a variety of wireless sub-networks that may further overlay stand-alone ad-hoc networks, and the like, to provide an infrastructure-oriented connection for client devices 101-106. Such sub-networks may include mesh networks, Wireless LAN (“WLAN”) networks, cellular networks, and the like. In one embodiment, the system may include more than one wireless network.

Wireless network 107 may further include an autonomous system of terminals, gateways, routers, and the like connected by wireless radio links, and the like. These connectors may be configured to move freely and randomly and organize themselves arbitrarily, such that the topology of wireless network 107 may change rapidly.

Wireless network 107 may further employ a plurality of access technologies including 2nd (2G), 3rd (3G), 4th (4G) generation radio access for cellular systems, WLAN, Wireless Router (“WR”) mesh, and the like. Access technologies such as 2G, 3G, 4G and future access networks may enable wide area coverage for mobile devices, such as client devices 101-106 with various degrees of mobility. In one non-limiting example, wireless network 107 may enable a radio connection through a radio network access such as Global System for Mobile Communications (“GSM”), General Packet Radio Services (“GPRS”), Enhanced Data GSM Environment (“EDGE”), Wideband Code Division Multiple Access (“WCDMA”), and the like. In essence, wireless network 107 may include virtually any wireless communication mechanism by which information may travel between client devices 103-106 and another computing device, network, and the like.

Network 108 is configured to couple network devices with other computing devices, including cluster 120 and searcher device 110, and through wireless network 107 to client devices 103-105. Network 108 is enabled to employ any form of network mechanism for communicating information from one electronic device to another. Also, network 108 can include the Internet in addition to LANs, WANs, direct connections, such as through a universal serial bus (“USB”) port, other forms of network mechanism, or any combination thereof. On an interconnected set of LANs, including those based on differing architectures and protocols, a router acts as a link between LANs, enabling messages to be sent from one to another. In addition, communication links within LANs typically include twisted wire pair or coaxial cable, while communication links between networks may utilize analog telephone lines, full or fractional dedicated digital lines including T1, T2, T3, and T4, and/or other carrier mechanisms including, for example, E-carriers, Integrated Services Digital Networks (“ISDNs”), Digital Subscriber Lines (“DSLs”), wireless links including satellite links, or other communications links known to those skilled in the art. Moreover, communication links may further employ any of a variety of digital signaling technologies, including without limit, for example, DS-0, DS-1, DS-2, DS-3, DS-4, OC-3, OC-12, OC-48, or the like. Furthermore, remote computers and other related electronic devices could be remotely connected to either LANs or WANs via a modem and temporary telephone link. In one embodiment, network 108 may be configured to transport information of an Internet Protocol (“IP”). In essence, network 108 includes any communication method by which information may travel between computing devices.

Additionally, by way of example, network mechanisms include wired media such as twisted pair, coaxial cable, fiber optics, wave guides, and other wired media and wireless media such as acoustic, RF, infrared, and other wireless media.

Cluster 120 typically is configured to include loosely coupled network devices that may cooperate to provide another device with access to a service, resource, and the like. In one embodiment, cluster 120 is configured to manage data storage and replication of data received by one or more client devices operating as forwarder devices.

One embodiment of a cluster member indexer is disclosed in more detail below in conjunction with FIG. 3. Briefly, however, cluster 120 includes a plurality of member indexers 121-123, and master device 126. While master device 126 is illustrated as a separate device from member indexers 121-123, it should be understood that other embodiments are not so constrained. For example, one of member indexers 121-123 may also be configured to operate and perform functions of master device 126 as well as operating to perform functions of an indexer device. Further, should master device 126 fail or otherwise be determined to be non-responsive, any one or more of member indexers 121-123 may elect one of the member indexers 121-123 to become a master device. Such election may be performed using any mechanism, including a priority selection, a least loaded selection, a random selection, or the like.

Further, each of member indexers 121-123 and master device 126 is configured to communicate with the others to send messages, determine a status of another device within cluster 120, respond to requests for status or other information, or the like. For example, based on a timer or other algorithm, member indexers 121-123 and master device 126 may send out a ping or other status request to one or more of the other devices within cluster 120. As used herein, a ping may represent any message sent out that is arranged to expect to receive a response. In this manner, a failure to respond to the status request (or ping) may indicate that the device not responding has failed. Then, based on which device is determined to have failed, master device 126 or one of the member indexers 121-123 may assume the functions of the failed device.
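Ping-based failure detection, followed by a priority-based election of a replacement master, might look like the sketch below. The timeout rule, the priority table, and both function names are assumptions chosen for illustration; the text leaves the election mechanism open (priority, least loaded, random, or the like).

```python
def failed_devices(last_response, now, timeout):
    """Return devices whose last ping response is older than `timeout` seconds.

    `last_response` maps a device name to the time its last reply was seen.
    """
    return sorted(
        device for device, seen in last_response.items()
        if now - seen > timeout
    )


def elect_master(candidates, priority):
    """Pick the candidate with the highest priority, breaking ties by name."""
    return min(candidates, key=lambda name: (-priority[name], name))
```

A member that sees the master in the failed list would call `elect_master` over the surviving indexers; breaking ties by name is one simple way for members that evaluate the rule independently to reach the same choice.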

In one embodiment, master device 126 may be any network device that is configured to monitor status of the member indexers 121-123, assign GEN_IDs, and indicate available indexers useable as secondary indexers for data storage/replication. Master device 126 may also provide information to client devices 101-106 and/or searcher device 110, including information about indexers available for storing data, GEN_IDs and updates to GEN_IDs, and the like. Master device 126 may also coordinate planned and unplanned transitions of indexers from secondary indexer status to primary indexer status for a given bucket or buckets of data. While master device 126 may maintain status of which indexers are primary or secondary, and which are storing what data for a given GEN_ID, master device 126 need not touch or otherwise manage the data or journals about the data. Master device 126 may perform other actions as described herein.

Member indexers 121-123 represent elements of the described embodiments that may index and store data and events, and provide replication of the data and events. Indexers 121-123 may collect, parse, and store data to facilitate fast and accurate information retrieval. Index design for storage may incorporate interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, physics, and computer science. Also, indexes may reside in flat files in a data store on a file system. Index files may be managed to facilitate flexible searching and fast data retrieval, eventually archiving them according to a configurable schedule, request, or the like. During indexing, incoming raw data from, for example, a forwarder device, may be processed to enable fast search and analysis, the results of which may be stored in an index, or bucket. As part of the indexing process, the indexer 121-123 may add knowledge to the data in various ways, including by: separating a data stream into individual, searchable events; creating or identifying timestamps; extracting fields such as host, source, and source type; performing user-defined actions on the incoming data, such as identifying custom fields, masking sensitive data, writing new or modified keys, applying breaking rules for multi-line events, filtering unwanted events, and routing events to specified indexes or servers, and the like. In one embodiment, indexers 121-123 may also generate journals of the received raw data that provide meta-data about the raw data, and/or other information that may be useable for building and/or otherwise regenerating portions of the raw data or information about the raw data.
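Several of the indexing steps listed above (event separation, timestamp identification, field attachment, masking, and a multi-line breaking rule) can be illustrated with a short sketch. Everything specific here is an assumption for the example: the timestamp format, the card-number pattern used for masking, the rule that a line without a leading timestamp continues the previous event, and the function name `index_events`.

```python
import re
from datetime import datetime

# Assumed formats: a leading "YYYY-MM-DD HH:MM:SS " timestamp and a
# 16-digit, dash-separated number treated as sensitive data.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")
CARD = re.compile(r"\b\d{4}-\d{4}-\d{4}-(\d{4})\b")


def mask(text):
    """Mask all but the last four digits of anything matching CARD."""
    return CARD.sub(r"XXXX-XXXX-XXXX-\1", text)


def index_events(raw, host, source):
    """Split a raw stream into events with time, host, and source fields."""
    events = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # filter empty lines
        match = TIMESTAMP.match(line)
        if match:
            events.append({
                "time": datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"),
                "host": host,
                "source": source,
                "text": mask(line[match.end():]),
            })
        elif events:
            # Breaking rule: a line without a timestamp continues the event.
            events[-1]["text"] += "\n" + mask(line)
    return events
```

The resulting list of dictionaries stands in for what would be written to a bucket; a real indexer would also persist the raw data and a journal alongside it.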

Indexers 121-123 may be selected as a primary indexer for at least a portion of the received data and manage that data in buckets as described further below in conjunction with FIG. 4. A designated primary indexer may provide acknowledgements to a forwarder device indicating that the received data has been received and/or has been stored in a bucket. A designated primary indexer may further generate a journal about the received data. In one embodiment, the designated primary indexer may then send the data and journal to one or more other indexers to be replicated. The primary indexer may further resend the data or send the data to a different indexer for replication based on whether or not an acknowledgement is received from the secondary indexer(s).

While indexers 121-123 are illustrated within cluster 120 as residing on different network devices, other embodiments are not so constrained. In another embodiment, each indexer may reside within a blade server architecture. Generally, a blade server is a stripped down server computing device with a modular design optimized to minimize a use of physical space and energy. A blade enclosure can include several blade servers and provide each with power, cooling, network interfaces, input/output interfaces, and resource management. A plurality of blade servers may also be included in one enclosure that shares resources provided by the enclosure to reduce size, power, and/or cost.

Moreover, indexers 121-123 need not be physically collocated. Thus, for example, indexer 121 may reside in California, while indexers 122-123 might reside on the east coast of America, or the like. Clearly, other variations are also envisaged. Moreover, while three indexers are illustrated, cluster 120 may include many more or fewer indexers than illustrated.

Illustrative Client Device

FIG. 2 shows one embodiment of client device 200 that may be included ina system implementing the invention. Client device 200 may represent anyof a variety of platforms useable to perform actions as disclosedwithin. Client device 200 may include many more or less components thanthose shown in FIG. 2. However, the components shown are sufficient todisclose an illustrative embodiment for practicing the presentinvention. Client device 200 may represent, for example, one embodimentof at least one of client devices 101-106 of FIG. 1.

As shown in the figure, client device 200 includes a central processing unit (“CPU”) 202 in communication with a mass memory 226 via a bus 234. Client device 200 also includes a power supply 228, one or more network interfaces 236, an audio interface 238, a display 240, a keypad 242, an illuminator 244, a video interface 246, an input/output interface 248, a haptic interface 250, and a global positioning systems (“GPS”) receiver 232.

Power supply 228 provides power to client device 200. A rechargeable or non-rechargeable battery may be used to provide power. The power may also be provided by an external power source, such as an alternating current (“AC”) adapter or a powered docking cradle that supplements and/or recharges a battery.

Client device 200 may optionally communicate with a base station (not shown), or directly with another computing device. Network interface 236 includes circuitry for coupling client device 200 to one or more networks, and is constructed for use with one or more communication protocols and technologies including, but not limited to, GSM, code division multiple access (“CDMA”), time division multiple access (“TDMA”), user datagram protocol (“UDP”), transmission control protocol/Internet protocol (“TCP/IP”), Short Message Service (“SMS”), GPRS, WAP, ultra wide band (“UWB”), Institute of Electrical and Electronics Engineers (“IEEE”) 802.16 Worldwide Interoperability for Microwave Access (“WiMax”), session initiation protocol/real-time transport protocol (“SIP/RTP”), or any of a variety of other wired and/or wireless communication protocols. Network interface 236 is sometimes known as a transceiver, transceiving device, or network interface card (“NIC”).

Audio interface 238 is arranged to produce and receive audio signals such as the sound of a human voice. For example, audio interface 238 may be coupled to a speaker and microphone (not shown) to enable telecommunication with others and/or generate an audio acknowledgement for some action.

Display 240 may be an LCD, gas plasma, light emitting diode (“LED”), or any other type of display used with a computing device. Display 240 may also include a touch sensitive screen arranged to receive input from an object such as a stylus or a digit from a human hand.

Moreover, display 240 may be configured to employ any of a variety of network connection types, including, but not limited to, High-Bandwidth Digital Content Protection (HDCP) connection types, Display Port (DP), Digital Visual Interface (DVI), and High-Definition Multimedia Interface (HDMI), as well as Gigabit Video Interface (GVIF), Standard-definition (SD), and Unified Display Interface (UDI). At least some of these network connection types provide a form of digital copy protection. Whether display 240 is connected through one of these, or other, network connection types may be detected using a variety of techniques, including signature transmissions, protocol handshakes, authentication procedures, or the like. Changing usage of a network connection type may indicate a change in a level of trust of at least one component of client device 200.

Keypad 242 may comprise any input device arranged to receive input from a user. For example, keypad 242 may include a push button numeric dial, or a keyboard. Keypad 242 may also include command buttons that are associated with selecting and sending images.

Illuminator 244 may provide a status indication and/or provide light. Illuminator 244 may remain active for specific periods of time or in response to events. For example, when illuminator 244 is active, it may backlight the buttons on keypad 242 and stay on while the client device is powered. Also, illuminator 244 may backlight these buttons in various patterns when particular actions are performed, such as dialing another client device. Illuminator 244 may also cause light sources positioned within a transparent or translucent case of the client device to illuminate in response to actions.

Video interface 246 is arranged to capture video images, such as a still photo, a video segment, an infrared video, or the like. For example, video interface 246 may be coupled to a digital video camera, a web-camera, or the like. Video interface 246 may comprise a lens, an image sensor, and other electronics. Image sensors may include a complementary metal-oxide-semiconductor (“CMOS”) integrated circuit, charge-coupled device (“CCD”), or any other integrated circuit for sensing light.

Client device 200 also comprises input/output interface 248 for communicating with external devices, such as a headset, or other input or output devices not shown in FIG. 2. Input/output interface 248 can utilize one or more communication technologies, such as USB, infrared, Bluetooth™, or the like. Haptic interface 250 is arranged to provide tactile feedback to a user of the client device. For example, the haptic interface 250 may be employed to vibrate client device 200 in a particular way when another user of a computing device is calling.

GPS transceiver 232 can determine the physical coordinates of client device 200 on the surface of the Earth. GPS transceiver 232, in some embodiments, may be optional. GPS transceiver 232 typically outputs a location as latitude and longitude values. However, GPS transceiver 232 can also employ other geo-positioning mechanisms, including, but not limited to, triangulation, assisted GPS (“AGPS”), Enhanced Observed Time Difference (“E-OTD”), Cell Identifier (“CI”), Service Area Identifier (“SAI”), Enhanced Timing Advance (“ETA”), Base Station Subsystem (“BSS”), or the like, to further determine the physical location of client device 200 on the surface of the Earth. It is understood that under different conditions, GPS transceiver 232 can determine a physical location within millimeters for client device 200; and in other cases, the determined physical location may be less precise, such as within a meter or significantly greater distances. In one embodiment, however, client device 200 may, through other components, provide other information that may be employed to determine a physical location of the device, including, for example, a Media Access Control (“MAC”) address, IP address, or the like.

Mass memory 226 includes a Random Access Memory (“RAM”) 204, a Read-only Memory (“ROM”) 222, and other storage means. Mass memory 226 illustrates an example of computer readable storage media (devices) for storage of information such as computer readable instructions, data structures, program modules or other data. Mass memory 226 stores a basic input/output system (“BIOS”) 224 for controlling low-level operation of client device 200. The mass memory also stores an operating system 206 for controlling the operation of client device 200. It will be appreciated that this component may include a general-purpose operating system such as a version of UNIX, or LINUX™, or a specialized client communication operating system such as Windows Mobile™, or the Symbian® operating system. The operating system may include, or interface with, a Java virtual machine module that enables control of hardware components and/or operating system operations via Java application programs.

Mass memory 226 further includes one or more data storage 208, which can be utilized by client device 200 to store, among other things, applications 214 and/or other data. For example, data storage 208 may also be employed to store information that describes various capabilities of client device 200. The information may then be provided to another device based on any of a variety of events, including being sent as part of a header during a communication, sent upon request, or the like. Data storage 208 may also be employed to store various raw data to be sent to cluster 120 for storage. At least a portion of the information may also be stored on another component of client device 200, including, but not limited to, computer readable storage device 230, a disk drive or other computer-readable storage device (not shown) within client device 200.

Applications 214 may include computer executable instructions which, when executed by client device 200, transmit, receive, and/or otherwise process messages (e.g., SMS, Multimedia Message Service (“MMS”), instant messages (“IM”), email, and/or other messages), audio, and video, and enable telecommunication with another user of another client device. Other examples of application programs include calendars, search programs, email clients, IM applications, SMS applications, voice over Internet Protocol (“VOIP”) applications, contact managers, task managers, transcoders, database programs, word processing programs, security applications, spreadsheet programs, games, and so forth. Applications 214 may include, for example, browser 218, searcher 271, and forwarder 261.

Browser 218 may include virtually any application configured to receive and display graphics, text, multimedia, and the like, employing virtually any web based protocol. In one embodiment, the browser application is enabled to employ HDML, WML, WMLScript, JavaScript, SGML, HTML, XML, and the like, to display and send a message. However, any of a variety of other web-based programming languages may be employed. In one embodiment, browser 218 may enable a user of client device 200 to provide data to and/or receive data from another computing device, such as cluster 120, and/or searcher device 110 of FIG. 1.

Forwarder 261 may enable client device 200 to operate as a forwarder device to prepare and send data about various actions of client device 200 to cluster 120 of FIG. 1 for storage. Forwarder 261 may collect the data in real-time and/or non-real-time. Forwarder 261 may query master device 126 for information about indexers within cluster 120 that may be available for storage of data. In one embodiment, forwarder 261 may receive address information about the indexers to enable forwarder 261 to send data.

Forwarder 261 may select an indexer to receive the data based on any of a variety of criteria, including, but not limited to, a load-balancing algorithm. In one embodiment, selection of an indexer may result in that indexer becoming a primary indexer for that data. In one embodiment, should forwarder 261 not receive an acknowledgement from at least the primary indexer in response to sending data, forwarder 261 may select a different indexer to which to send the data.
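As a non-limiting illustration only, the acknowledgement-based selection described above might be sketched as follows. The function name `send_with_failover`, the `send` callback, and the indexer names are hypothetical and are not part of the disclosed embodiments; a real forwarder might order candidates using a load-balancing algorithm and detect a missing acknowledgement with a timeout.

```python
from typing import Callable, Optional, Sequence


def send_with_failover(
    data: bytes,
    candidates: Sequence[str],
    send: Callable[[str, bytes], bool],
) -> Optional[str]:
    """Offer the data to each candidate indexer in order.

    `send` is a hypothetical transport callback that returns True only
    when the indexer acknowledges the data. The first indexer that
    acknowledges is returned and is treated as primary for the data.
    Returns None when no candidate acknowledges.
    """
    for indexer in candidates:
        if send(indexer, data):
            return indexer
    return None
```

In this sketch, the order of `candidates` stands in for whatever selection criteria the forwarder applies.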

In one embodiment, forwarder 261 may specify an amount of space for storing of the data, and/or may specify a replication factor indicating a number of times the data is to be replicated. However, in other embodiments, the replication factor might be configured on an indexer based on, for example, a type of data being replicated. Forwarder 261 may also indicate when to close a bucket for data, open a new bucket, or the like.

Searcher 271 may perform functions described above for searcher device 110 of FIG. 1. Thus, in one embodiment, searcher 271 may operate local to client device 200 to enable client device 200 to perform searches, data recovery, or other actions upon the data provided to indexers through at least forwarder 261. In one embodiment, searcher 271 may obtain status information from master device 126, including a GEN_ID for data of interest to searcher 271. Searcher 271 may then send a request for the data to cluster 120 along with the GEN_ID. Searcher 271 then receives a response from one or more primary indexers associated with the requested data and GEN_ID.

Illustrative Network Device

FIG. 3 shows one embodiment of a network device 300, according to one embodiment of the invention. Network device 300 may include many more or fewer components than those shown. The components shown, however, are sufficient to disclose an illustrative embodiment for practicing the innovations. Network device 300 may be configured to operate as a server, a client, a peer, a host, or any other device. Network device 300 may represent, for example, member indexers 121-123 of FIG. 1.

Network device 300 includes central processing unit 302, computer readable storage device 328, network interface unit 330, an input/output interface 332, hard disk drive 334, video display adapter 336, and a mass memory, all in communication with each other via bus 326. The mass memory generally includes RAM 304, ROM 322 and one or more permanent (non-transitory) mass storage devices, such as hard disk drive 334, tape drive, optical drive, and/or floppy disk drive. The mass memory stores operating system 306 for controlling the operation of network device 300. Any general-purpose operating system may be employed. BIOS 324 is also provided for controlling the low-level operation of network device 300. As illustrated in FIG. 3, network device 300 also can communicate with the Internet, or some other communications network, via network interface unit 330, which is constructed for use with various communication protocols including the TCP/IP protocol. Network interface unit 330 is sometimes known as a transceiver, transceiving device, or network interface card (NIC).

Network device 300 also comprises input/output interface 332 for communicating with external devices, such as a keyboard, or other input or output devices not shown in FIG. 3. Input/output interface 332 can utilize one or more communication technologies, such as USB, infrared, Bluetooth™, or the like.

The mass memory as described above illustrates another type of computer-readable media, namely computer-readable storage media and/or processor-readable storage media. Computer-readable storage media (devices) may include volatile, nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer readable storage media include RAM, ROM, Electrically Erasable Programmable Read-only Memory (“EEPROM”), flash memory or other memory technology, Compact Disc Read-only Memory (“CD-ROM”), digital versatile disks (“DVD”) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical media which can be used to store the desired information and which can be accessed by a computing device.

As shown, data storage 308 may include a database, text, spreadsheet, folder, file, or the like, that may be configured to maintain and store user account identifiers, user profiles, email addresses, IM addresses, and/or other network addresses, or the like. Data storage 308 may further include program code, data, algorithms, and the like, for use by a processor, such as central processing unit 302, to execute and perform actions. In one embodiment, at least some of data storage 308 might also be stored on another component of network device 300, including, but not limited to, computer readable storage device 328, hard disk drive 334, or the like.

Data storage 308 may further store indexed data 310. Indexed data 310 is described below in more detail in conjunction with FIG. 4. Briefly, however, indexed data 310 may be arranged in a variety of buckets usable to store and/or otherwise manage data from a forwarder device, and/or other data and meta-data, including journals, about the data.

The mass memory also stores program code and data. One or more applications 314 are loaded into mass memory and run on operating system 306. Examples of application programs may include transcoders, schedulers, calendars, database programs, word processing programs, Hypertext Transfer Protocol (“HTTP”) programs, customizable user interface programs, IPSec applications, encryption programs, security programs, SMS message servers, IM message servers, email servers, account managers, and so forth. Web server 318, searcher 319, index manager 320, and master component 323 may also be included.

Briefly, master component 323 may be configured to enable network device 300 to operate as a master device performing actions as described herein. In one embodiment, master component 323 may also be configured to manage status communications with other indexers, election of a replacement master device should a current master device for the cluster fail, or the like.

Index manager 320 may perform a variety of actions as disclosed herein to enable network device 300 to operate as an indexer within the cluster. Thus, index manager 320 may manage collection of data, generation of journals about the data, storage of the data, generation of buckets for storage of the data, and closing and opening of buckets based on a variety of criteria, and may provide acknowledgements in response to receiving data for storage. Index manager 320 may also receive information indicating that network device 300 is a primary indexer, secondary indexer, or the like, for one or more particular buckets of data, receive and manage data based on a GEN_ID, or the like.

Index manager 320 may also perform actions to send the data to another indexer to be replicated, operate to reconfigure network device 300 from a secondary indexer to a primary indexer or the reverse, and/or otherwise manage the data based on a GEN_ID.

Searcher 319 may be arranged to respond to queries about the data stored and/or otherwise managed by network device 300. Searcher 319 may receive a request for data, determine whether network device 300 is a primary indexer for the data based on the data requested and the GEN_ID, and, in response, provide the data. Alternatively, searcher 319 may determine that network device 300 is secondary for the requested data and GEN_ID and elect to ignore the request for the data.
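By way of a hedged, non-limiting sketch, the primary-only response behavior described above might be modeled as shown below. The class `IndexerNode`, the function `broadcast_query`, and the bucket and indexer names are assumptions introduced for illustration and do not appear in the disclosed embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass
class IndexerNode:
    """Hypothetical in-memory model of one indexer."""
    name: str
    # (bucket_id, gen_id) pairs for which this node is primary.
    primary_for: Set[Tuple[str, int]] = field(default_factory=set)
    # Every replica holds the bucket contents, primary or not.
    buckets: Dict[str, List[str]] = field(default_factory=dict)

    def handle_query(self, bucket_id: str, gen_id: int) -> Optional[List[str]]:
        """Answer only when primary for this bucket at this GEN_ID."""
        if (bucket_id, gen_id) in self.primary_for:
            return list(self.buckets.get(bucket_id, []))
        return None  # A secondary ignores the request.


def broadcast_query(
    nodes: Sequence[IndexerNode], bucket_id: str, gen_id: int
) -> Dict[str, List[str]]:
    """Send the query to every node and keep only actual responses."""
    responses: Dict[str, List[str]] = {}
    for node in nodes:
        result = node.handle_query(bucket_id, gen_id)
        if result is not None:
            responses[node.name] = result
    return responses
```

Because at most one node is primary for a given bucket and GEN_ID in this sketch, a broadcast request yields each stored event at most once.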

Web server 318 represents any of a variety of services that are configured to provide content, including messages, over a network to another computing device. Thus, web server 318 includes, for example, a web server, an FTP server, a database server, a content server, or the like. Web server 318 may provide the content including messages over the network using any of a variety of formats including, but not limited to, WAP, HDML, WML, SGML, HTML, XML, Compact HTML (“cHTML”), Extensible HTML (“xHTML”), or the like. In one embodiment, web server 318 may provide an interface usable by a client device to searcher 319, index manager 320, indexed data 310, or the like.

General Operation

The operation of certain aspects of various embodiments will now be described with respect to FIGS. 4-7. FIG. 4 illustrates one non-limiting, non-exhaustive example of managing redundant data backup and recovery across a plurality of member indexers within a cluster.

As shown, architecture 400 of FIG. 4 illustrates a plurality of buckets 410 managed by indexers 121-123. Briefly, a bucket represents a mechanism usable for storing and/or otherwise managing data and events received from a forwarder device. As shown, buckets 410 may be designated as hot 411, warm 412, or cold 413 buckets, to indicate whether a bucket is open to receive and store data (hot), recently closed for receiving data but available for access of data (warm), or closed and possibly not readily available for access of data (cold). In one embodiment, based on any of a variety of criteria, buckets may have an expiration policy, indicating when a bucket is to be closed, moved from hot to warm to cold, or even removed from the system of buckets 410.
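For illustration only, the hot, warm, and cold designations might be modeled as a small state machine. The names `BucketState`, `age`, `accepts_writes`, and `readily_searchable` are hypothetical, and the criteria that trigger each transition are left to the expiration policy, which this sketch does not model.

```python
from enum import Enum
from typing import Optional


class BucketState(Enum):
    HOT = "hot"    # open to receive and store data
    WARM = "warm"  # closed for writes, still available for access
    COLD = "cold"  # closed, possibly not readily available


# One-way aging order; a cold bucket has no successor state.
_NEXT = {
    BucketState.HOT: BucketState.WARM,
    BucketState.WARM: BucketState.COLD,
}


def age(state: BucketState) -> Optional[BucketState]:
    """Return the next state, or None when a cold bucket is removed."""
    return _NEXT.get(state)


def accepts_writes(state: BucketState) -> bool:
    """Only a hot bucket is open to receive data."""
    return state is BucketState.HOT


def readily_searchable(state: BucketState) -> bool:
    """Simplifying assumption: cold buckets are treated as not readily available."""
    return state in (BucketState.HOT, BucketState.WARM)
```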

Architecture 400 illustrates that an indexer may have a plurality of buckets in which to manage data. Moreover, as shown, one indexer may be a primary indexer for some data, and a secondary indexer for other data. For example, indexer 121 is primary for A-data 421 and secondary for B-data 431, while indexer 122 is primary for B-data 432 and secondary for A-data 422. Also shown is that, based on the replication factor, there may be more than one secondary indexer, such as indexer 123 (for B-data 433).

FIG. 5 illustrates one embodiment of a signal flow usable to manage redundancy (replication) of data across multiple member indexers within a cluster. As shown in flow 500, indexer 121 and indexer 122 communicate with each other, a master device, and/or a forwarder device. Time is indicated as flowing downwards. Further, 1° and 2° indicate primary and secondary, respectively. It should also be noted that while more or fewer communications may occur between at least these devices, those shown are sufficient to disclose an illustrative embodiment for practicing the innovations.

As shown, the forwarder device, based on any of a variety of criteria, selects indexer 121 and sends data and a replication factor value to indexer 121, which is to store the data and further forward the data to other indexers for replication. In one embodiment, receipt of the data from a forwarder device may initiate opening or otherwise creating a hot bucket for storage of the data. In one embodiment, indexer 121 may receive information from the master device indicating a GEN_ID for the bucket.

Indexer 121, assuming it is functional and active (has not failed), then prepares and saves the data to the hot bucket. As noted above, in one embodiment, the received data may be referred to as a “slice.” That is, in one embodiment, the slice may represent a small amount of data, typically a few kilobytes in size, although other sizes may also be received. Further, indexer 121 may generate a journal or perform other actions on the data as discussed above. The journal and results of the actions may also be saved within the bucket.

Indexer 121 may, in one embodiment, receive information from the master device indicating other indexers, such as indexer 122, that are assigned or otherwise available to be secondary indexers for the data. In another embodiment, indexer 121 may employ the replication factor and send the data to the indicated number of other indexers, thereby indicating that they are to become secondary indexers for the data. The master device may then update information about which devices are primary or secondary for the data.

In one embodiment, indexer 122 receives the data/journal from indexer 121 and saves the data, operating as a secondary indexer. Assuming that indexer 122 has not failed, operating as a secondary indexer, indexer 122 provides an acknowledgment to indexer 121 that it has received and stored the data/journal. Should indexer 122 not provide an acknowledgement within a defined amount of time, indexer 121 may elect to resend the data/journal and/or select a different indexer as a secondary indexer for the data. In one embodiment, because GEN_ID is global, each indexer may further store the data in a bucket having the GEN_ID as an identifier.

Independent of whether or not indexer 121 receives an acknowledgment from indexer 122, when indexer 121 has received and stored the data, indexer 121 provides an acknowledgement to the forwarder device.

Also shown is an example of what happens when indexer 121 fails. As shown, the forwarder device may send data to indexer 121 for a new hot bucket. However, because the forwarder device does not receive an acknowledgement, the forwarder may seek to send the data to another indexer based on any of a variety of selection criteria. In this instance, again assuming that indexer 122 is active, it receives the data, generates a journal, creates a hot bucket for the data, and returns an acknowledgement to the forwarder device. Indexer 122 then becomes the primary indexer for this data.

FIG. 6 illustrates a flow chart of one embodiment of a process usable to manage redundancy (replication) of data across multiple member indexers within a cluster. Process 600 of FIG. 6 may be performed by an indexer within a cluster as discussed above in conjunction with FIG. 1.

Process 600 begins, after a start, when data is received. In one embodiment, a replication factor might also be received. Processing flows to block 604 where, in one embodiment, communications of status may be performed with the master device and/or other indexers. In one embodiment, the communications may indicate an availability of the other indexers to operate as a secondary indexer, a GEN_ID, or the like.

Process 600 continues to block 606 where, in one embodiment, the indexer receiving the data from the forwarder device is designated as the primary indexer for that data and GEN_ID. Continuing next to block 608, the journal is created and/or updated as discussed above. Flowing to block 610, the data and journal are stored into a hot bucket having the GEN_ID. At block 612, the data/journal are forwarded to one or more other indexers to operate as secondary indexers for the data having the GEN_ID. At block 614, an acknowledgement that the data has been received and/or stored by the primary indexer is sent to the forwarder device.

Flowing next to decision block 616, a determination is made whether an acknowledgement is received from each of the designated secondary indexers. If not, then processing flows to decision block 618; otherwise, processing flows to block 626.

At decision block 618, a determination is made whether to select a different secondary indexer. This may be based, in part, on a licensing or service level agreement with the forwarder device, the replication factor, or the like. If another secondary indexer is to be selected, processing flows to block 620 where a different indexer is selected, and at block 622, the data/journal is sent to the selected indexer. Processing then loops back to decision block 616. Otherwise, at block 624, the data/journal may be resent to the same secondary indexer, and processing loops back to decision block 616.

At block 626, the primary indexer may receive another slice of data to be stored in the hot bucket for the identified GEN_ID. Processing then continues back to block 612.
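The acknowledgement loop of blocks 612 through 624 might be sketched, under stated assumptions, as follows. The function `replicate_slice`, the `send` callback, the `spares` pool, and the `max_resends` limit are hypothetical; in particular, the disclosed process leaves open how the choice between resending and selecting a different indexer is made, and this sketch simply resends a fixed number of times before moving to a spare.

```python
from typing import Callable, List, Sequence


def replicate_slice(
    payload: bytes,
    secondaries: Sequence[str],
    spares: Sequence[str],
    send: Callable[[str, bytes], bool],
    max_resends: int = 1,
) -> List[str]:
    """Forward a slice to each secondary and collect acknowledgements.

    `send` is a hypothetical callback returning True when the target
    acknowledges. A silent target is retried up to `max_resends` times,
    after which the next unused spare indexer is tried in its place.
    Returns the indexers that acknowledged, in order.
    """
    spare_pool = list(spares)
    acknowledged: List[str] = []
    for target in secondaries:
        current = target
        while current is not None:
            # One initial send plus up to `max_resends` resends.
            acked = any(send(current, payload) for _ in range(max_resends + 1))
            if acked:
                acknowledged.append(current)
                break
            # Select a different secondary indexer, if any remain.
            current = spare_pool.pop(0) if spare_pool else None
    return acknowledged
```

If the returned list is shorter than the replication factor requires, a caller could treat the slice as under-replicated; that policy is outside this sketch.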

While data storage is one aspect of the innovations, another is managing of requests for the stored data. Thus, FIG. 7 illustrates non-limiting, non-exhaustive examples of managing a request for data during a member indexer failure within a cluster. Time is illustrated as flowing downwards.

As shown, flow 700 of FIG. 7 represents signal flows between a searcher device, master device, and indexers, such as discussed above in conjunction with FIG. 1. It should also be noted that while more or fewer communications may occur between at least these devices, those shown are sufficient to disclose an illustrative embodiment for practicing the innovations.

As shown, indexer 121 is the primary indexer for the data under discussion at GEN_ID=59, while indexer 122 is a secondary indexer for the data.

A request for the data is received from a searcher device. In one embodiment, the searcher device provides the GEN_ID for the data requested. As illustrated, the request may be broadcast to each of the indexers within the cluster. However, because indexer 122 is not the primary indexer, it ignores the request, and indexer 121, which is primary for the data, instead provides a response to the data request. In this manner, the same data is not provided more than once in response to the same request.

Now, consider that indexer 121 fails. When another request is received, in one embodiment, because there is no primary indexer, a response might not be provided. However, in one embodiment, the request for the data might be delayed at least until indexer 122 is provided sufficient time to reconfigure and become the new primary indexer for the data. In one embodiment, the master device might initiate this action. However, in another embodiment, the secondary indexers might also recognize that the primary indexer has failed, and elect among themselves a new primary indexer for the data. In one embodiment, the GEN_ID may also be incremented as illustrated in flow 700.
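As a minimal, non-limiting sketch of the promotion described above, the following models one bucket's role assignments and increments the GEN_ID when a secondary takes over. The `BucketRoles` record, the `promote_on_failure` function, and the use of `min` as a deterministic stand-in for an election are assumptions for illustration; the embodiments do not prescribe a particular election rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BucketRoles:
    """Hypothetical record of who serves one bucket's data."""
    gen_id: int
    primary: Optional[str]
    secondaries: List[str]
    history: Dict[int, str]  # GEN_ID -> primary at that generation


def promote_on_failure(roles: BucketRoles, failed: str) -> BucketRoles:
    """Return updated roles after `failed` stops responding."""
    survivors = [s for s in roles.secondaries if s != failed]
    if roles.primary != failed:
        # A secondary failed: the primary and the GEN_ID are unchanged.
        return BucketRoles(roles.gen_id, roles.primary, survivors, dict(roles.history))
    if not survivors:
        # No replica can take over; no primary exists for now.
        return BucketRoles(roles.gen_id, None, [], dict(roles.history))
    new_primary = min(survivors)  # stand-in for an election
    new_gen = roles.gen_id + 1
    history = dict(roles.history)
    history[new_gen] = new_primary
    remaining = [s for s in survivors if s != new_primary]
    return BucketRoles(new_gen, new_primary, remaining, history)
```

Keeping the per-generation history in the record reflects the observation that it can be useful to know which indexer was primary at each GEN_ID value.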

However, in still another embodiment, the searcher device may elect to wait until a new primary indexer is available, or elect to proceed with a partial data response.

In any event, as shown above, while the master device may be aware of which indexer is primary or secondary, and where data may be stored, as well as managing GEN_IDs, the cluster may still operate should the master device fail. That is because the data is still managed by the indexers, and thus, the indexers are not dependent upon the master device for the data. As noted, election of the primary indexer (and potentially even the secondary indexers) may be achieved without the master device's intervention. Further, it is even possible that failure of a primary indexer may be resolved while the master device has failed. Additionally, searches and other data queries may use earlier GEN_ID values.

However, as noted, because GEN_ID increments, there may be value in being aware of a history of the GEN_ID, and which indexer was/is primary for that GEN_ID value. Thus, in one embodiment, an indexer might be primary for one value of GEN_ID for some data, and be secondary or otherwise unrelated to the data at a different GEN_ID. Thus, in one embodiment, search requests may provide multiple GEN_IDs, and receive responses from different primary indexers based on the different GEN_IDs.

For example, a first search request might request that searches be performed by an indexer using a first GEN_ID value, while a second search request might request that searches be performed by an indexer using a second, different, GEN_ID value. Further, while the requests might be received by multiple indexers, because each indexer knows whether or not it is primary for different buckets at different GEN_ID values, each of the requests will receive responses that are complete and non-redundant sets of results even though which indexers search which buckets may change.

In one embodiment, a bit map may be employed to provide such multiple GEN_ID usages. For example, an index value may have bits of some range, such as 64 bits. Other ranges may also be used. In this example, setting of some bits may indicate that searches for a portion of the data may occur on one indexer, while setting of other bits indicates that searches for the other portion are to be performed on another, different indexer. This feature may further be useable to indicate, for example, that for some of the data, one indexer is primary, and for other portions of the data, a different indexer is primary. This may be useful for implementing, for example, multi-geographic storage policies. For example, indexer 121 might be primary to store data for the west coast, and indexer 122 might be primary to store data for the east coast. Further, based on this type of bit mapping extension, different replication factors might also be used based on the geography, or the like.
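One possible reading of the bit map, offered only as a hedged sketch, assigns each bit position to a portion of the data (for example, a geographic region) and gives each indexer a 64-bit mask whose set bits mark the portions for which it is primary. The bit assignments `WEST` and `EAST`, the helper names, and the per-indexer mask layout are all assumptions; the embodiments do not specify the encoding.

```python
from typing import Dict, Iterable, List

MASK_BITS = 64  # example width; other ranges may also be used

# Hypothetical bit positions for two portions of the data.
WEST = 0
EAST = 1


def make_mask(portions: Iterable[int]) -> int:
    """Build a mask with one bit set per portion."""
    mask = 0
    for portion in portions:
        if not 0 <= portion < MASK_BITS:
            raise ValueError(f"portion {portion} outside 0..{MASK_BITS - 1}")
        mask |= 1 << portion
    return mask


def is_primary_for(mask: int, portion: int) -> bool:
    """True when the bit for `portion` is set in `mask`."""
    return bool((mask >> portion) & 1)


def route(portion: int, masks: Dict[str, int]) -> List[str]:
    """Names of the indexers whose mask marks them primary for `portion`."""
    return sorted(name for name, mask in masks.items() if is_primary_for(mask, portion))
```

For example, with `masks = {"indexer-121": make_mask([WEST]), "indexer-122": make_mask([EAST])}`, a search for west coast data would be routed to indexer 121 only.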

In any event, it will be understood that each block of the flowchart illustration, and combinations of blocks in the flowchart illustration, can be implemented by computer program instructions. These program instructions may be provided to a processor to produce a machine, such that the instructions, which execute on the processor, create means for implementing the actions specified in the flowchart block or blocks. The computer program instructions may be executed by a processor to cause a series of operational steps to be performed by the processor to produce a computer-implemented process such that the instructions, which execute on the processor, provide steps for implementing the actions specified in the flowchart block or blocks. The computer program instructions may also cause at least some of the operational steps shown in the blocks of the flowchart to be performed in parallel. Moreover, some of the steps may also be performed across more than one processor, such as might arise in a multi-processor computer system, a cloud system, a multi-server system, or the like. In addition, one or more blocks or combinations of blocks in the flowchart illustration may also be performed concurrently with other blocks or combinations of blocks, or even in a different sequence than illustrated without departing from the scope or spirit of the invention.

Accordingly, blocks of the flowchart illustration support combinations of means for performing the specified actions, combinations of steps for performing the specified actions and program instruction means for performing the specified actions. It will also be understood that each block of the flowchart illustration, and combinations of blocks in the flowchart illustration, can be implemented by special purpose hardware based systems, which perform the specified actions or steps, or combinations of special purpose hardware and computer instructions.

The above specification, examples, and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.

1. A system, comprising: a plurality of forwarder devices, each forwarder device operating within a different client device having one or more processors, wherein at least one forwarder device performs actions, including: selecting an indexer from within a plurality of indexers; sending data to the selected indexer to be replicated according to a defined replication factor; when an Acknowledgement message is not received from the selected indexer indicating that the data is received and stored by the selected indexer, selecting a different indexer in which to send the data for replication; and a cluster having the plurality of indexers, each indexer having one or more processors that perform actions, including: receiving, by the selected indexer, the data and defined replication factor, the selected indexer being designated as a primary indexer for the received data at a specified generation identifier (GEN_ID) for the data; sending, by the selected indexer, to a number of other indexers the data and a journal including metadata useable to recreate the data, the number of other indexers being defined by the replication factor and each of the other indexers being designated as secondary indexers for the data at the specified GEN_ID; and each indexer in the plurality of indexers receiving a query request for the data at the specified GEN_ID, wherein the primary indexer for the data at the specified GEN_ID responds, and each secondary indexer for the data at the specified GEN_ID ignores the request.
2. The system of claim 1, further comprising: a master device within the cluster having one or more processors that perform actions, including: assigning the generation identifier (GEN_ID) to the data; and when the primary indexer for the GEN_ID is determined to be non-responsive, selecting at least one of the secondary indexers to become a new primary indexer for the data; and incrementing the GEN_ID for the data, such that the new primary indexer is primary for the data at the incremented GEN_ID, while the other secondary indexers remain secondary for the data at the incremented GEN_ID.
3. The system of claim 2, further comprising: a searcher operating on one or more processors that perform actions, including: receiving a request for data stored by indexers within the cluster; querying the master device to obtain a GEN_ID for the data; and sending the request for data with the obtained GEN_ID to each of the indexers.
4. The system of claim 1, wherein the primary indexer performs further actions, including: generating the journal to be provided to at least one secondary indexer.
5. The system of claim 1, wherein the replication factor is based on one of a type of data to be replicated, or based on information from a forwarder device.
6. (canceled)
7. The system of claim 1, wherein the primary indexer performs actions, further comprising: determining whether the data has been received and saved by the primary indexer or at least one secondary indexer; based on the determination that the data has been received and saved, sending the Acknowledgement (ACK) message to the forwarder device providing the data; and when an ACK is not received by one of the secondary indexers, selecting to resend the data to the non-responsive secondary indexer or selecting a new secondary indexer to replace the non-responsive secondary indexer.
8. A cluster architecture, comprising: a plurality of indexers, each indexer having one or more processors that perform actions, including: receiving, by a selected indexer, data from a forwarder device operating within a client device requesting the data be replicated for backup at a specified replication factor, the selected indexer being designated as a primary indexer for the data at a specified generation identifier (GEN_ID) for the data, wherein when an Acknowledgement message is not sent by the selected indexer to the forwarder device indicating that the selected indexer has received and stored the data, the forwarder device selects a different indexer as the selected indexer and resends the data to the different indexer; sending, by the primary indexer, to a number of other indexers the data and a journal that includes metadata useable to recreate the data, the number of other indexers being defined by the replication factor and each of the other indexers being designated as secondary indexers for the data at the specified GEN_ID; and each indexer in the plurality of indexers receiving a query request for the data at the specified GEN_ID, wherein the primary indexer for the data at the specified GEN_ID responds to the request and each secondary indexer for the data at the specified GEN_ID ignores the request; and a master device operating on at least one processor to perform actions, including: assigning the GEN_ID to the data; and when the primary indexer for the data at the GEN_ID is determined to be non-responsive, incrementing the GEN_ID for the data and selectively assigning at least one of the secondary indexers as primary for the data at the incremented GEN_ID.
9. The cluster architecture of claim 8, further comprising: a master device that operates on one or more processors and performs actions, including: receiving a request from a searcher for data stored by the indexers; and in response providing the searcher with a GEN_ID for the data to be used by the searcher in requesting the stored data.
10. The cluster architecture of claim 8, wherein: when it is determined that the master device is non-responsive to a request from a searcher requesting a GEN_ID for data to be retrieved from the indexers, enabling the searcher to use an earlier GEN_ID to obtain the data.
11. (canceled)
12. The cluster architecture of claim 8, wherein at least one indexer is configured to store data into buckets that are designated as hot, warm, or cold based in part on an activity of updating of the buckets, and wherein the at least one indexer is further configured to manage at least one bucket of data designated as primary for one set of data, and at least one other bucket of other data designated as secondary for the other data.
13. The cluster architecture of claim 8, wherein the master device is configured to broadcast changes in the GEN_ID to each of the plurality of indexers.
14. The cluster architecture of claim 8, wherein the forwarder device is one of a plurality of forwarder devices, each forwarder device operating independent of each other forwarder device to independently select an indexer as a primary indexer for data sent to the cluster architecture by the respective forwarder device.
15. A computer-based method operating one or more processors within a cluster architecture, the method comprising: receiving, by a selected indexer within a plurality of indexers, data from a forwarder device that operates on at least one processor within a client device that is separate from the plurality of indexers, the selected indexer being designated as a primary indexer for the data at a specified generation identifier (GEN_ID) for the data, wherein when an Acknowledgement message is not sent by the selected indexer to the forwarder device indicating that the selected indexer has received and stored the data, the forwarder device selects a different indexer as the selected indexer and resends the data to the different indexer; sending, by the selected indexer, to a number of other indexers the data and a journal including metadata useable to recreate the data, the number of other indexers being determined from a replication factor defined based in part on a type of the data, and wherein the number of other indexers are each designated as secondary indexers for the data at the specified GEN_ID; and each indexer in the plurality of indexers receiving a query request for the data at the specified GEN_ID, wherein the primary indexer for the data at the specified GEN_ID responds, and each secondary indexer for the data at the specified GEN_ID ignores the request.
16. The computer-based method of claim 15, further comprising: employing a master device having one or more processors that perform actions, including: assigning the generation identifier (GEN_ID) to the data; when the primary indexer for the GEN_ID is determined to be non-responsive, selecting at least one of the secondary indexers to become a new primary indexer for the data; and incrementing the GEN_ID for the data, such that the new primary indexer is primary for the data at the incremented GEN_ID, while the other secondary indexers remain secondary for the data at the incremented GEN_ID.
17. The computer-based method of claim 15, wherein the forwarder device is one of a plurality of forwarder devices, each forwarder device operating independent of each other forwarder device to independently select an indexer as a primary indexer for data sent to the cluster architecture by the respective forwarder device.
18. The computer-based method of claim 15, wherein the replication factor is based on one of a type of data to be replicated, or based on information from a forwarder device.
19. The computer-based method of claim 15, wherein at least one indexer is designated as a secondary indexer for the data and a primary indexer for different data.
20. The computer-based method of claim 15, further comprising: wherein the query request is received from a searcher that operates on at least one processor, and wherein the searcher queries a master device to request the GEN_ID for the data being requested in the query request; and when the master device is determined to be non-responsive to the request for the GEN_ID, enabling the searcher to employ a prior GEN_ID for the data being requested.
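By way of a non-limiting illustration, the generation-based behavior recited above (a primary indexer responding to a query at a given GEN_ID, secondary indexers ignoring it, and a master device incrementing the GEN_ID when a primary indexer is non-responsive) might be sketched in Python as follows. The class names `Indexer` and `Master`, the function `search`, the `alive` flag, and the in-memory `store` are hypothetical simplifications. The sketch omits replication, the journal, and acknowledgement handling, and it simply promotes the first responsive indexer.

```python
from dataclasses import dataclass, field


@dataclass
class Indexer:
    """Hypothetical in-memory stand-in for an indexer."""
    name: str
    alive: bool = True
    # GEN_ID -> True when this indexer is primary for the data at that generation.
    primary_at: dict = field(default_factory=dict)
    store: dict = field(default_factory=dict)

    def query(self, gen_id, key):
        """Respond only when primary at gen_id; otherwise ignore (return None)."""
        if self.alive and self.primary_at.get(gen_id, False):
            return self.store.get(key)
        return None


class Master:
    """Hypothetical master device that assigns and increments the GEN_ID."""

    def __init__(self, indexers, primary):
        self.indexers = indexers
        self.gen_id = 0
        self._assign(primary)

    def _assign(self, primary):
        # Record, per indexer, whether it is primary at the current GEN_ID.
        for ix in self.indexers:
            ix.primary_at[self.gen_id] = ix is primary

    def handle_failure(self):
        """Promote a responsive indexer and increment GEN_ID if the primary is down."""
        current = next(ix for ix in self.indexers if ix.primary_at[self.gen_id])
        if current.alive:
            return self.gen_id
        new_primary = next((ix for ix in self.indexers if ix.alive), None)
        if new_primary is None:
            raise RuntimeError("no responsive indexer available to promote")
        self.gen_id += 1
        self._assign(new_primary)
        return self.gen_id


def search(indexers, gen_id, key):
    """Send the query to every indexer and keep the responses that were not ignored."""
    return [r for ix in indexers if (r := ix.query(gen_id, key)) is not None]
```

In this sketch, a searcher that cannot reach the master could call `search` with a GEN_ID it obtained earlier. A result is returned only while the indexer that was primary at that earlier generation remains responsive.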